AITopics | narrated instructional video


narrated instructional video

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

COBE: Contextualized Object Embeddings from Narrated Instructional Video

Neural Information Processing Systems, Dec-24-2025, 10:39:21 GMT

Many objects in the real world undergo dramatic variations in visual appearance. For example, a tomato may be red or green, sliced or chopped, fresh or fried, liquid or solid. Training a single detector to accurately recognize tomatoes in all these different states is challenging. On the other hand, contextual cues (e.g., the presence of a knife, a cutting board, a strainer or a pan) are often strongly indicative of how the object appears in the scene. Recognizing such contextual cues is useful not only to improve the accuracy of object detection or to determine the state of the object, but also to understand its functional properties and to infer ongoing or upcoming human-object interactions.

contextualized object embedding, name change, narrated instructional video


Industry:

Education > Educational Technology > Media (0.45)
Education > Educational Technology > Audio & Video (0.45)

Technology: Information Technology > Artificial Intelligence (0.76)


Review for NeurIPS paper: COBE: Contextualized Object Embeddings from Narrated Instructional Video

Neural Information Processing Systems, Jan-27-2025, 13:40:42 GMT

While this algorithm is specifically designed for detectors, Miech et al. (2019) used unsupervised NCE losses (much like the ones in this paper) to understand the natural-language descriptions associated with videos; the algorithm presented here seems like the most straightforward extension of that idea to bounding boxes. Little attention is given to demonstrating that the use of bounding boxes fundamentally changes the problem. Update: The rebuttal addresses my point regarding the accuracy of the evaluation. I had misunderstood which annotations are available with EPIC-Kitchens, and I am therefore changing my review. I would encourage the authors to state clearly in the paper which annotations EPIC-Kitchens provides.
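To make the reviewer's comparison concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss applied to bounding boxes. It assumes that each detected region (bounding-box) embedding is paired with the embedding of its associated narration text, that similarity is cosine similarity scaled by a temperature (0.07 here, an arbitrary choice), and that the other text embeddings in the same batch serve as negatives. This is an illustrative stand-in, not the COBE paper's exact formulation; the function names `cosine` and `info_nce` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(region_embs: List[Vector], text_embs: List[Vector],
             temperature: float = 0.07) -> float:
    """Hypothetical InfoNCE-style loss: region_embs[i] pairs with text_embs[i].

    Every other text embedding in the batch acts as a negative for region i
    (in-batch negatives). Illustrative sketch only, not the paper's loss.
    """
    if len(region_embs) != len(text_embs) or not region_embs:
        raise ValueError("need equal, non-empty batches of paired embeddings")
    total = 0.0
    for i, region in enumerate(region_embs):
        # Temperature-scaled similarity of region i to every text in the batch.
        logits = [cosine(region, t) / temperature for t in text_embs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the matching (positive) pair.
        total += log_denom - logits[i]
    return total / len(region_embs)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    # Correctly aligned pairs should give a lower loss than swapped pairs.
    print(info_nce(regions, texts) < info_nce(regions, texts[::-1]))
```

The only change from a clip-level NCE setup in this sketch is that the units being contrasted are region embeddings rather than whole-video embeddings, which is the sense in which the reviewer calls the approach a straightforward extension to bounding boxes.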

benchmark, contextualized object embedding, narrated instructional video


Genre: Instructional Material > Course Syllabus & Notes (0.40)

Industry: